The original preregistrations for these studies contained both hypotheses and the specific analytic strategies used to test them; however, they did not include a meta-analytic strategy. Separately, exploration of the data from Experiments 1-6 generated a number of research questions and hypotheses that were not contained in the original preregistrations, or for which the analytic strategy to test them was poorly specified or difficult to interpret. Finally, some methodological improvements were conceived after Experiments 1-6 were run (e.g., improved exclusion criteria to ensure that participants stayed on the page where they watched/listened to the intervention in its entirety). We therefore elected to use the data from Experiments 1-6 to create this (non-preregistered) alternative analytic strategy, which formalizes our core research questions, hypotheses, analytic models, inference rules, and other researcher degrees of freedom. This analytic strategy (and the code implementing it) will be preregistered for Experiment 7, which will provide strong confirmatory tests of these hypotheses.

Dependencies & functions

# dependencies
library(tidyverse)
library(knitr)
library(kableExtra)
library(brms)
library(parallel)
library(tidybayes)
library(bayestestR)
library(sjPlot)
library(psych)
library(rsample)
library(broom)
library(purrr)
library(IATscores)
library(lavaan)
library(semTools)

options(knitr.kable.NA = "/")

# set seed for bootstrapping reproducibility
set.seed(42)

# create necessary folder
dir.create("models", showWarnings = FALSE)

Exclusions & standardization

All dependent variables (self-reported evaluations, IAT D2 scores, and behavioral intentions) were standardized (by 1 SD) after exclusions and prior to analysis (see Lorah, 2018: https://doi.org/10.1186/s40536-018-0061-2). This was done within each level of both IVs (i.e., by Source Valence condition [positive vs. negative] and by Video Content [Genuine vs. Deepfaked]). The beta estimates obtained from the Bayesian models (see research questions and data analysis plans below) therefore represent standardized beta values (\(\beta\) rather than \(B\)). More importantly, the nature of this standardization makes these estimates somewhat comparable to the frequentist standardized effect size metric Cohen’s \(d\), as both are differences in (estimated) means expressed as a proportion of the SD, although they should not be treated as equivalent. Effect size magnitudes here can therefore be thought of on a scale comparable to Cohen’s \(d\). As such, to aid interpretability, point estimates of effect size will be reported as \(\delta\) (delta).

# full data
data_processed <- read.csv("../data/processed/4_data_participant_level_with_hand_scoring.csv") %>%
  # set factor levels for t test comparisons
  mutate(source_valence = fct_relevel(source_valence,
                                      "negative",
                                      "positive"),
         experiment_condition = fct_relevel(experiment_condition,
                                            "genuine",
                                            "deepfaked"),
         experiment = as.factor(experiment))

# apply exclusions
data_after_exclusions <- data_processed %>%
  filter(exclude_subject == FALSE & 
           exclude_implausible_intervention_linger == FALSE) %>%
  # standardize DVs by 1 SD within each experiment and its conditions
  # (scores are divided by the within-cell SD without mean-centering,
  # so condition differences remain interpretable in SD units)
  group_by(experiment, experiment_condition, source_valence) %>%
  mutate(mean_self_reported_evaluation = mean_self_reported_evaluation / sd(mean_self_reported_evaluation),
         IAT_D2 = IAT_D2 / sd(IAT_D2),
         mean_intentions = mean_intentions / sd(mean_intentions)) %>%
  ungroup()

# item level for iat
data_iat_item_level_after_exclusions <- read_csv("../data/processed/2.4_data_iat_item_level.csv") %>%
  # exclude the same participants as above
  semi_join(rename(data_after_exclusions, subject_original = subject), by = "subject_original") 

Distributions

ggplot(data_after_exclusions, aes(mean_self_reported_evaluation, color = experiment)) +
  geom_density() +
  facet_wrap( ~ experiment_condition + source_valence) +
  ggtitle("Standardized scores")

ggplot(data_after_exclusions, aes(IAT_D2, color = experiment)) +
  geom_density() +
  facet_wrap( ~ experiment_condition + source_valence) +
  ggtitle("Standardized scores")

ggplot(data_after_exclusions, aes(mean_intentions, color = experiment)) +
  geom_density() +
  facet_wrap( ~ experiment_condition + source_valence) +
  ggtitle("Standardized scores")

Demographics

Pre exclusions

data_processed %>%
  group_by(experiment) %>%
  summarise(n = n(),
            excluded_n = sum(exclude_subject | exclude_implausible_intervention_linger),
            excluded_percent = (excluded_n / n) *100) %>%
  mutate_if(is.numeric, round, digits = 1) %>%
  kable(align = "c") %>%
  kable_styling()
experiment n excluded_n excluded_percent
1 165 24 14.5
2 167 36 21.6
3 428 91 21.3
4 429 106 24.7
5 276 66 23.9
6 265 61 23.0

Post exclusions

data_after_exclusions %>%
  group_by(experiment) %>%
  summarise(n = n(),
            age_mean = mean(age, na.rm = TRUE),
            age_sd = sd(age, na.rm = TRUE)) %>%
  mutate_if(is.numeric, round, digits = 1) %>%
  kable(align = "c") %>%
  kable_styling()
experiment n age_mean age_sd
1 141 29.7 7.6
2 131 31.1 7.3
3 337 29.8 8.7
4 323 30.1 9.0
5 210 31.2 11.5
6 204 33.3 12.5
data_after_exclusions %>%
  count(experiment, gender) %>%
  spread(gender, n) %>%
  kable(align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
experiment female male Non-binary other Prefer not to disclose
1 67 73 / 1 /
2 76 55 / / /
3 184 149 / 4 /
4 189 132 / 2 /
5 119 88 2 / 1
6 120 82 2 / /

Internal consistency

Self-reported evaluations

model_sr <- "scale =~ ratings_bad_good + ratings_dislike_like + ratings_negative_positive" 

fit_cfa_sr <- data_after_exclusions %>%
  cfa(model = model_sr, data = .) 

results_reliability_sr <- fit_cfa_sr %>%
  reliability() %>%
  as.data.frame() %>%
  rownames_to_column(var = "metric") %>%
  select(metric, estimate = scale) %>%
  filter(metric %in% c("alpha",
                       "omega2")) %>%
  mutate(metric = recode(metric,
                         "alpha" = "alpha",
                         "omega2" = "omega_t"),
         estimate = round(estimate, 3))

results_reliability_sr %>%
  kable(align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
metric estimate
alpha 0.987
omega_t 0.987

IAT

split half

results_iat_split_half_reliability <- data_iat_item_level_after_exclusions %>%
  SplitHalf.D2(IATdata = .) %>%
  mutate(algorithm = ifelse(algorithm == "p2112", "D2", algorithm),
         splithalf = round(splithalf, 3))
results_iat_split_half_reliability %>%
  kable(align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
algorithm splithalf
D2 0.839

Behavioral intentions

model_bi <- "scale =~ behavioral_intentions_share + behavioral_intentions_subscribe + behavioral_intentions_recommend" 

fit_cfa_bi <- data_after_exclusions %>%
  cfa(model = model_bi, data = .) 

results_reliability_bi <- fit_cfa_bi %>%
  reliability() %>%
  as.data.frame() %>%
  rownames_to_column(var = "metric") %>%
  select(metric, estimate = scale) %>%
  filter(metric %in% c("alpha",
                       "omega2")) %>%
  mutate(metric = recode(metric,
                         "alpha" = "alpha",
                         "omega2" = "omega_t"),
         estimate = round(estimate, 3))

results_reliability_bi %>%
  kable(align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
metric estimate
alpha 0.941
omega_t 0.941

Can you establish evaluations via genuine and Deepfaked online content?

  • Analyses employ Bayesian multilevel models with experiment as a random (group-level) intercept, and source_valence, experiment_condition, and their interaction as IVs. This could therefore be described as akin to a Bayesian multilevel ANOVA.
  • DVs were standardized as noted above; fitted model estimates therefore represent standardized beta values (which, due to the specifics of the standardization, have a comparable [but not identical] interpretation to Cohen’s d values).
  • Bayesian p values are also reported: these are on a similar scale to frequentist p values, but are technically 1 minus the posterior probability that the effect is greater than 0, i.e., \(1 - P(\beta>0)\).
  • Weak priors were placed on all parameters (normal distributions with M = 0 and SD = 10). Inspection of the chains indicated good convergence in all cases.
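The Bayesian p value defined above can be computed directly from posterior draws of a coefficient. A minimal sketch, using a simulated vector `posterior_draws` as a stand-in for the draws that would in practice be extracted from a fitted brms model:

```r
# illustrative only: simulate posterior draws of a coefficient
set.seed(42)
posterior_draws <- rnorm(28000, mean = 0.5, sd = 0.3)

# probability of direction: posterior probability that the effect is > 0
pd <- mean(posterior_draws > 0)

# Bayesian p value: 1 - P(beta > 0)
bayesian_p <- 1 - pd
```

This is the same quantity computed later via `pd_to_p()` with `direction = "one-sided"`.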

Sample sizes

data_after_exclusions %>%
  select(source_valence, 
         experiment_condition) %>%
  drop_na() %>%
  count(experiment_condition,
        source_valence) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
experiment_condition source_valence n
genuine negative 382
genuine positive 425
deepfaked negative 257
deepfaked positive 282

Self-reported evaluations

Fit model

fit_selfreport <-
  brm(formula = mean_self_reported_evaluation ~ source_valence * experiment_condition + (1 | experiment),
      family = gaussian(),
      data    = data_after_exclusions,
      file    = "models/fit_selfreport",
      prior   = prior(normal(0, 10)),
      iter    = 10000,
      warmup  = 3000,
      control = list(adapt_delta = 0.99),  # to avoid divergent transitions
      chains  = 4,
      cores   = parallel::detectCores())

Inspect convergence

summary(fit_selfreport)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_self_reported_evaluation ~ source_valence * experiment_condition + (1 | experiment) 
##    Data: data_after_exclusions (Number of observations: 1346) 
## Samples: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
##          total post-warmup samples = 28000
## 
## Group-Level Effects: 
## ~experiment (Number of levels: 6) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.18      0.11     0.06     0.46 1.00     5768     9514
## 
## Population-Level Effects: 
##                                                      Estimate Est.Error
## Intercept                                               -1.48      0.10
## source_valencepositive                                   2.70      0.07
## experiment_conditiondeepfaked                            0.08      0.09
## source_valencepositive:experiment_conditiondeepfaked     0.01      0.12
##                                                      l-95% CI u-95% CI Rhat
## Intercept                                               -1.68    -1.28 1.00
## source_valencepositive                                   2.56     2.85 1.00
## experiment_conditiondeepfaked                           -0.09     0.25 1.00
## source_valencepositive:experiment_conditiondeepfaked    -0.22     0.24 1.00
##                                                      Bulk_ESS Tail_ESS
## Intercept                                                8855    10218
## source_valencepositive                                  16019    18209
## experiment_conditiondeepfaked                           14109    17038
## source_valencepositive:experiment_conditiondeepfaked    13233    16420
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.03      0.02     1.00     1.07 1.00    22065    19280
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_selfreport, ask = FALSE)

Check informativeness of prior

Using Gelman’s (2019) simple heuristic: For each parameter, compare the posterior sd to the prior sd. If the posterior sd for any parameter is more than 0.1 times the prior sd, then note that the prior was informative.
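The heuristic can also be checked by hand. A minimal sketch, assuming the prior SD of 10 specified in the models here and an illustrative posterior SD (in practice this would be the SD of the fitted model's draws for that parameter):

```r
# prior SD from prior(normal(0, 10)) used in the models
prior_sd <- 10

# illustrative posterior SD for one coefficient (hypothetical value)
posterior_sd <- 0.07

# Gelman heuristic: flag the prior as informative if the posterior SD
# exceeds 0.1 times the prior SD
prior_informative <- posterior_sd > 0.1 * prior_sd
```

`check_prior()` automates this comparison across all model parameters.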

check_prior(fit_selfreport) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter Prior_Quality
b_Intercept uninformative
b_source_valencepositive uninformative
b_experiment_conditiondeepfaked uninformative
b_source_valencepositive.experiment_conditiondeepfaked uninformative

Interpret posteriors

# plot_model(fit_selfreport)
plot_model(fit_selfreport, type = "pred", terms = c("source_valence", "experiment_condition"))

# percent moderation
draws_sr <-
  bind_cols(
    select(spread_draws(fit_selfreport, b_source_valencepositive), b_source_valencepositive),
    select(spread_draws(fit_selfreport, b_experiment_conditiondeepfaked), b_experiment_conditiondeepfaked),
    select(spread_draws(fit_selfreport, `b_source_valencepositive:experiment_conditiondeepfaked`), `b_source_valencepositive:experiment_conditiondeepfaked`)
  ) %>%
  rename(main_valence = b_source_valencepositive,
         main_experiment_condition = b_experiment_conditiondeepfaked,
         interaction = `b_source_valencepositive:experiment_conditiondeepfaked`) %>%
  mutate(effect_genuine = main_valence,
         effect_deepfaked = main_valence + main_experiment_condition + interaction,
         #percent_moderation = (main_experiment_condition + interaction)/main_valence *100,  # alt method, same result
         percent_comparison = (effect_deepfaked/effect_genuine)*100)

# results
estimates_sr <-
  map_estimate(draws_sr) %>%
  full_join(bayestestR::hdi(draws_sr, ci = .95) %>%
              rename(CI_95_lower = CI_low,
                     CI_95_upper = CI_high) %>%
              as_tibble(),
            by = "Parameter") %>%
  full_join(bayestestR::hdi(draws_sr, ci = .90) %>%
              as_tibble() %>%
              rename(CI_90_lower = CI_low,
                     CI_90_upper = CI_high),
            by = "Parameter") %>%
  full_join(draws_sr %>%
              select(-percent_comparison) %>%
              gather(Parameter, value) %>%
              group_by(Parameter) %>%
              summarize(pd = mean(value > 0)) %>%
              mutate(p = pd_to_p(pd, direction = "one-sided")) %>%
              ungroup() %>%
              select(Parameter, p),
            by = "Parameter") %>%
  select(Parameter, MAP_Estimate, CI_95_lower, CI_95_upper,
         CI_90_lower, CI_90_upper, p)

# results table
estimates_sr %>%
  mutate_at(.vars = c("MAP_Estimate", "CI_95_lower", "CI_95_upper", "CI_90_lower", "CI_90_upper"), round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter MAP_Estimate CI_95_lower CI_95_upper CI_90_lower CI_90_upper p
main_valence 2.70 2.56 2.85 2.58 2.82 0.0000000
main_experiment_condition 0.08 -0.09 0.25 -0.07 0.22 0.1813214
interaction 0.01 -0.22 0.23 -0.18 0.20 0.4719643
effect_genuine 2.70 2.56 2.85 2.58 2.82 0.0000000
effect_deepfaked 2.78 2.63 2.95 2.66 2.93 0.0000000
percent_comparison 103.32 97.29 109.36 98.05 108.21 /
# hypothesis testing
H1a <- ifelse((estimates_sr %>% filter(Parameter == "effect_genuine") %>% pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H1b <- ifelse((estimates_sr %>% filter(Parameter == "effect_deepfaked") %>% pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H2a <- ifelse((estimates_sr %>% filter(Parameter == "effect_deepfaked") %>% pull(CI_90_lower)) > 
                (estimates_sr %>% filter(Parameter == "effect_genuine") %>% pull(CI_95_lower)), 
              "Accepted", "Rejected")

comparison_string_sr <-
  paste0("Deepfakes are ",
         estimates_sr %>% filter(Parameter == "percent_comparison") %>% pull(MAP_Estimate) %>% round(1),
         "% (95% CI [",
         estimates_sr %>% filter(Parameter == "percent_comparison") %>% pull(CI_95_lower) %>% round(1),
         ", ",
         estimates_sr %>% filter(Parameter == "percent_comparison") %>% pull(CI_95_upper) %>% round(1),
         "]) as effective as genuine content in establishing self-reported evaluations")

H1a

The content of the genuine videos (i.e., Source Valence) will influence participants’ self-reported evaluations. Specifically, we will use a Bayesian linear model (model 1) to estimate a 95% Confidence Interval on standardized effect size change in self-reported evaluations between Source Valence conditions in the genuine video condition subgroup. Confidence Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

H1b

The content of the Deepfaked videos (i.e., Source Valence) will influence participants’ self-reported evaluations. Specifically, we will use a Bayesian linear model (model 1) to estimate a 95% Confidence Interval on standardized effect size change in self-reported evaluations between Source Valence conditions in the Deepfaked video condition subgroup. Confidence Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

H2a

Change in self-reported evaluations (i.e., between Source Valence conditions) induced by Deepfaked video content will be non-inferior to genuine content.

  • Result: Accepted. Deepfakes are 103.3% (95% CI [97.3, 109.4]) as effective as genuine content in establishing self-reported evaluations.

Implicit

Fit model

fit_implicit <-
  brm(formula = IAT_D2 ~ source_valence * experiment_condition + (1 | experiment),
      family = gaussian(),
      data    = data_after_exclusions,
      file    = "models/fit_implicit",
      prior   = prior(normal(0, 10)),
      iter    = 10000,
      warmup  = 3000,
      control = list(adapt_delta = 0.99),  # to avoid divergent transitions
      chains  = 4,
      cores   = parallel::detectCores())

Inspect convergence

summary(fit_implicit)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: IAT_D2 ~ source_valence * experiment_condition + (1 | experiment) 
##    Data: data_after_exclusions (Number of observations: 1346) 
## Samples: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
##          total post-warmup samples = 28000
## 
## Group-Level Effects: 
## ~experiment (Number of levels: 6) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.20      0.11     0.07     0.49 1.00     6102     9811
## 
## Population-Level Effects: 
##                                                      Estimate Est.Error
## Intercept                                                0.05      0.11
## source_valencepositive                                   1.32      0.07
## experiment_conditiondeepfaked                            0.07      0.08
## source_valencepositive:experiment_conditiondeepfaked     0.00      0.11
##                                                      l-95% CI u-95% CI Rhat
## Intercept                                               -0.17     0.26 1.00
## source_valencepositive                                   1.19     1.46 1.00
## experiment_conditiondeepfaked                           -0.09     0.23 1.00
## source_valencepositive:experiment_conditiondeepfaked    -0.21     0.22 1.00
##                                                      Bulk_ESS Tail_ESS
## Intercept                                                7763     9240
## source_valencepositive                                  16451    18513
## experiment_conditiondeepfaked                           14981    18192
## source_valencepositive:experiment_conditiondeepfaked    14243    17669
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.00      0.02     0.97     1.04 1.00    22279    19314
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_implicit, ask = FALSE)

Check informativeness of prior

Using Gelman’s (2019) simple heuristic: For each parameter, compare the posterior sd to the prior sd. If the posterior sd for any parameter is more than 0.1 times the prior sd, then note that the prior was informative.

check_prior(fit_implicit) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter Prior_Quality
b_Intercept uninformative
b_source_valencepositive uninformative
b_experiment_conditiondeepfaked uninformative
b_source_valencepositive.experiment_conditiondeepfaked uninformative

Interpret posteriors

#plot_model(fit_implicit)
plot_model(fit_implicit, type = "pred", terms = c("source_valence", "experiment_condition"))

# percent moderation
draws_imp <-
  bind_cols(
    select(spread_draws(fit_implicit, b_source_valencepositive), b_source_valencepositive),
    select(spread_draws(fit_implicit, b_experiment_conditiondeepfaked), b_experiment_conditiondeepfaked),
    select(spread_draws(fit_implicit, `b_source_valencepositive:experiment_conditiondeepfaked`), `b_source_valencepositive:experiment_conditiondeepfaked`)
  ) %>%
  rename(main_valence = b_source_valencepositive,
         main_experiment_condition = b_experiment_conditiondeepfaked,
         interaction = `b_source_valencepositive:experiment_conditiondeepfaked`) %>%
  mutate(effect_genuine = main_valence,
         effect_deepfaked = main_valence + main_experiment_condition + interaction,
         #percent_moderation = (main_experiment_condition + interaction)/main_valence *100,  # alt method, same result
         percent_comparison = (effect_deepfaked/effect_genuine)*100)

# results table
estimates_imp <-
  map_estimate(draws_imp) %>%
  full_join(bayestestR::hdi(draws_imp, ci = .95) %>%
              rename(CI_95_lower = CI_low,
                     CI_95_upper = CI_high) %>%
              as_tibble(),
            by = "Parameter") %>%
  full_join(bayestestR::hdi(draws_imp, ci = .90) %>%
              as_tibble() %>%
              rename(CI_90_lower = CI_low,
                     CI_90_upper = CI_high),
            by = "Parameter") %>%
  full_join(draws_imp %>%
              select(-percent_comparison) %>%
              gather(Parameter, value) %>%
              group_by(Parameter) %>%
              summarize(pd = mean(value > 0)) %>%
              mutate(p = pd_to_p(pd, direction = "one-sided")) %>%
              ungroup() %>%
              select(Parameter, p),
            by = "Parameter") %>%
  select(Parameter, MAP_Estimate, CI_95_lower, CI_95_upper,
         CI_90_lower, CI_90_upper, p)

estimates_imp %>%
  mutate_at(.vars = c("MAP_Estimate", "CI_95_lower", "CI_95_upper", "CI_90_lower", "CI_90_upper"), round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter MAP_Estimate CI_95_lower CI_95_upper CI_90_lower CI_90_upper p
main_valence 1.32 1.18 1.46 1.21 1.44 0.0000000
main_experiment_condition 0.06 -0.10 0.23 -0.07 0.20 0.2096429
interaction 0.00 -0.21 0.22 -0.18 0.18 0.4891071
effect_genuine 1.32 1.18 1.46 1.21 1.44 0.0000000
effect_deepfaked 1.39 1.24 1.55 1.26 1.52 0.0000000
percent_comparison 104.95 93.46 117.85 95.01 115.54 /
# hypothesis testing
H1c <- ifelse((estimates_imp %>% filter(Parameter == "effect_genuine") %>% pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H1d <- ifelse((estimates_imp %>% filter(Parameter == "effect_deepfaked") %>% pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H2b <- ifelse((estimates_imp %>% filter(Parameter == "effect_deepfaked") %>% pull(CI_90_lower)) > 
                (estimates_imp %>% filter(Parameter == "effect_genuine") %>% pull(CI_95_lower)), 
              "Accepted", "Rejected")

comparison_string_imp <-
  paste0("Deepfakes are ",
         estimates_imp %>% filter(Parameter == "percent_comparison") %>% pull(MAP_Estimate) %>% round(1), 
         "% (95% CI [",
         estimates_imp %>% filter(Parameter == "percent_comparison") %>% pull(CI_95_lower) %>% round(1),
         ", ",
         estimates_imp %>% filter(Parameter == "percent_comparison") %>% pull(CI_95_upper) %>% round(1),
         "]) as effective as genuine content in establishing implicit evaluations (IAT D2 scores)")

H1c

The content of the genuine videos (i.e., Source Valence) will influence participants’ IAT D2 scores. Specifically, we will use a Bayesian linear model (model 2) to estimate a 95% Confidence Interval on standardized effect size change in IAT D2 scores between Source Valence conditions in the genuine video condition subgroup. Confidence Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

H1d

The content of the Deepfaked videos (i.e., Source Valence) will influence participants’ IAT D2 scores. Specifically, we will use a Bayesian linear model (model 2) to estimate a 95% Confidence Interval on standardized effect size change in IAT D2 scores between Source Valence conditions in the Deepfaked video condition subgroup. Confidence Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

H2b

Change in IAT D2 scores (i.e., between Source Valence conditions) induced by Deepfaked video content will be non-inferior to genuine content.

  • Result: Accepted. Deepfakes are 105% (95% CI [93.5, 117.9]) as effective as genuine content in establishing implicit evaluations (IAT D2 scores).

Behavioral intentions

Fit model

fit_intentions <-
  brm(formula = mean_intentions ~ source_valence * experiment_condition, # no random effect for experiment as only exp 6 assessed intentions
      family = gaussian(),
      data    = data_after_exclusions,
      file    = "models/fit_intentions",
      prior   = prior(normal(0, 10)),
      iter    = 10000,
      warmup  = 3000,
      control = list(adapt_delta = 0.99),  # to avoid divergent transitions
      chains  = 4,
      cores   = parallel::detectCores())

Inspect convergence

summary(fit_intentions)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_intentions ~ source_valence * experiment_condition 
##    Data: data_after_exclusions (Number of observations: 204) 
## Samples: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
##          total post-warmup samples = 28000
## 
## Population-Level Effects: 
##                                                      Estimate Est.Error
## Intercept                                               -1.64      0.14
## source_valencepositive                                   1.12      0.20
## experiment_conditiondeepfaked                           -1.70      0.21
## source_valencepositive:experiment_conditiondeepfaked     1.95      0.28
##                                                      l-95% CI u-95% CI Rhat
## Intercept                                               -1.92    -1.35 1.00
## source_valencepositive                                   0.72     1.52 1.00
## experiment_conditiondeepfaked                           -2.10    -1.29 1.00
## source_valencepositive:experiment_conditiondeepfaked     1.39     2.50 1.00
##                                                      Bulk_ESS Tail_ESS
## Intercept                                               15183    17895
## source_valencepositive                                  13178    15992
## experiment_conditiondeepfaked                           13001    16122
## source_valencepositive:experiment_conditiondeepfaked    11385    14004
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.01      0.05     0.91     1.11 1.00    18637    17174
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_intentions, ask = FALSE)

Check informativeness of prior

Using Gelman’s (2019) simple heuristic: For each parameter, compare the posterior sd to the prior sd. If the posterior sd for any parameter is more than 0.1 times the prior sd, then note that the prior was informative.

check_prior(fit_intentions) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter Prior_Quality
b_Intercept uninformative
b_source_valencepositive uninformative
b_experiment_conditiondeepfaked uninformative
b_source_valencepositive.experiment_conditiondeepfaked uninformative

Interpret posteriors

#plot_model(fit_intentions)
plot_model(fit_intentions, type = "pred", terms = c("source_valence", "experiment_condition"))

# percent moderation
draws_intentions <-
  bind_cols(
    select(spread_draws(fit_intentions, b_source_valencepositive), b_source_valencepositive),
    select(spread_draws(fit_intentions, b_experiment_conditiondeepfaked), b_experiment_conditiondeepfaked),
    select(spread_draws(fit_intentions, `b_source_valencepositive:experiment_conditiondeepfaked`), `b_source_valencepositive:experiment_conditiondeepfaked`)
  ) %>%
  rename(main_valence = b_source_valencepositive,
         main_experiment_condition = b_experiment_conditiondeepfaked,
         interaction = `b_source_valencepositive:experiment_conditiondeepfaked`) %>%
  mutate(effect_genuine = main_valence,
         effect_deepfaked = main_valence + main_experiment_condition + interaction,
         #percent_moderation = (main_experiment_condition + interaction)/main_valence *100,  # alt method, same result
         percent_comparison = (effect_deepfaked/effect_genuine)*100)

# results table
estimates_intentions <-
  map_estimate(draws_intentions) %>%
  full_join(bayestestR::hdi(draws_intentions, ci = .95) %>%
              rename(CI_95_lower = CI_low,
                     CI_95_upper = CI_high) %>%
              as_tibble(),
            by = "Parameter") %>%
  full_join(bayestestR::hdi(draws_intentions, ci = .90) %>%
              as_tibble() %>%
              rename(CI_90_lower = CI_low,
                     CI_90_upper = CI_high),
            by = "Parameter") %>%
  full_join(draws_intentions %>%
              select(-percent_comparison) %>%
              gather(Parameter, value) %>%
              group_by(Parameter) %>%
              summarize(pd = mean(value > 0)) %>%
              mutate(p = pd_to_p(pd, direction = "one-sided")) %>%
              ungroup() %>%
              select(Parameter, p),
            by = "Parameter") %>%
  select(Parameter, MAP_Estimate, CI_95_lower, CI_95_upper,
         CI_90_lower, CI_90_upper, p)

estimates_intentions %>%
  mutate_at(.vars = c("MAP_Estimate", "CI_95_lower", "CI_95_upper", "CI_90_lower", "CI_90_upper"), round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter MAP_Estimate CI_95_lower CI_95_upper CI_90_lower CI_90_upper p
main_valence 1.10 0.73 1.53 0.78 1.44 0
main_experiment_condition -1.68 -2.11 -1.30 -2.03 -1.35 0
interaction 1.97 1.41 2.52 1.48 2.41 0
effect_genuine 1.10 0.73 1.53 0.78 1.44 0
effect_deepfaked 1.37 0.99 1.76 1.04 1.69 0
percent_comparison 118.58 87.52 169.62 90.93 158.04 /
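The `p` column above is derived from the probability of direction (pd), i.e., the proportion of posterior draws on one side of zero. A minimal base-R sketch of the conversion, assuming `bayestestR::pd_to_p()`'s convention (the helper name `pd_to_p_sketch` is illustrative):

```r
# convert a probability of direction (pd) to a frequentist-like p value,
# following the convention assumed for bayestestR::pd_to_p()
pd_to_p_sketch <- function(pd, direction = "two-sided") {
  p <- 1 - pd
  if (direction == "two-sided") p <- 2 * p
  p
}

# e.g., if 97.5% of posterior draws are above zero:
pd_to_p_sketch(0.975, direction = "one-sided")  # 0.025
pd_to_p_sketch(0.975)                           # 0.05
```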
# hypothesis testing
H1e <- ifelse((estimates_intentions %>% filter(Parameter == "effect_genuine") %>% pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H1f <- ifelse((estimates_intentions %>% filter(Parameter == "effect_deepfaked") %>% pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H2c <- ifelse((estimates_intentions %>% filter(Parameter == "effect_deepfaked") %>% pull(CI_90_lower)) > 
                (estimates_intentions %>% filter(Parameter == "effect_genuine") %>% pull(CI_95_lower)), 
              "Accepted", "Rejected")

comparison_string_intentions <-
  paste0("Deepfakes are ",
         estimates_intentions %>% filter(Parameter == "percent_comparison") %>% pull(MAP_Estimate) %>% round(1), 
         "% (95% CI [",
         estimates_intentions %>% filter(Parameter == "percent_comparison") %>% pull(CI_95_lower) %>% round(1),
         ", ",
         estimates_intentions %>% filter(Parameter == "percent_comparison") %>% pull(CI_95_upper) %>% round(1),
         "]) as effective as genuine content in establishing self-reported evaluations")

H1e

The content of the genuine videos (i.e., Source Valence) will influence participants’ behavioral intention responses. Specifically, we will use a Bayesian linear model (model 2) to estimate a 95% Credible Interval on the standardized effect size of the change in behavioral intention scores between Source Valence conditions in the genuine video condition subgroup. Credible Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

H1f

The content of the Deepfaked videos (i.e., Source Valence) will influence participants’ behavioral intention responses. Specifically, we will use a Bayesian linear model (model 2) to estimate a 95% Credible Interval on the standardized effect size of the change in behavioral intention scores between Source Valence conditions in the Deepfaked video condition subgroup. Credible Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

H2c

Change in behavioral intentions (i.e., between Source Valence conditions) induced by Deepfaked video content will be non-inferior to genuine content.

  • Result: Accepted. Deepfakes are 118.6% (95% CI [87.5, 169.6]) as effective as genuine content in establishing behavioral intentions.
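The percent-comparison interval is obtained by computing the ratio within each posterior draw and only then summarizing, so it propagates the uncertainty in both effects. A toy base-R sketch with simulated stand-in draws (the means and SDs below are illustrative, not the fitted values):

```r
# illustrative only: the ratio of two posterior quantities is computed
# per draw and then summarized, so its interval reflects uncertainty in both.
# the normal draws below are stand-ins for real posterior samples.
set.seed(42)
draws_genuine   <- rnorm(10000, mean = 1.10, sd = 0.20)
draws_deepfaked <- rnorm(10000, mean = 1.35, sd = 0.20)

percent_comparison <- draws_deepfaked / draws_genuine * 100

# percentile summary of the per-draw ratio
round(quantile(percent_comparison, c(0.025, 0.50, 0.975)), 1)
```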

How good are people at detecting deepfakes?

Inter-rater reliability

data_after_exclusions %>%
  count(deepfake_detected,
        deepfake_detected_rater_1,
        deepfake_detected_rater_2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
deepfake_detected deepfake_detected_rater_1 deepfake_detected_rater_2 n
FALSE FALSE FALSE 414
FALSE FALSE TRUE 16
FALSE TRUE FALSE 31
TRUE TRUE TRUE 115
/ / / 770
data_after_exclusions %>%
  summarize(percent_agreement = round(mean(deepfake_detected_rater_1 == deepfake_detected_rater_2, na.rm = TRUE)*100, 1)) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
percent_agreement
91.8
data_after_exclusions %>%
  select(deepfake_detected_rater_1,                                   
         deepfake_detected_rater_2) %>%
  as.data.frame() %>%  # kappa function won't take tibbles
  psych::cohen.kappa(.)
## Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)
## 
## Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries 
##                  lower estimate upper
## unweighted kappa  0.72     0.78  0.84
## weighted kappa    0.72     0.78  0.84
## 
##  Number of subjects = 576

Interpretation of Kappa (Altman, 1999; Landis & Koch, 1977):

  • 0.61 - 0.80 Substantial
  • 0.81 - 1.00 Almost perfect
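As a sanity check, the percent agreement and kappa reported above can be reproduced by hand from the agreement counts (414 both-FALSE, 16 + 31 disagreements, 115 both-TRUE; 576 rated cases):

```r
# reproduce percent agreement and Cohen's kappa from the rater agreement table
n_ff <- 414  # both raters FALSE
n_ft <- 16   # rater 1 FALSE, rater 2 TRUE
n_tf <- 31   # rater 1 TRUE,  rater 2 FALSE
n_tt <- 115  # both raters TRUE
n <- n_ff + n_ft + n_tf + n_tt  # 576 rated cases

p_observed <- (n_ff + n_tt) / n                            # raw agreement
p_chance   <- ((n_tf + n_tt) / n) * ((n_ft + n_tt) / n) +  # both TRUE by chance
              ((n_ff + n_ft) / n) * ((n_ff + n_tf) / n)    # both FALSE by chance

kappa <- (p_observed - p_chance) / (1 - p_chance)

round(p_observed * 100, 1)  # 91.8, matching the table above
round(kappa, 2)             # 0.78, matching psych::cohen.kappa
```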

Can people accurately detect deepfakes?

  • Youden’s J = sensitivity + specificity - 1, aka informedness, aka “the probability of an informed decision (as opposed to a random guess) and takes into account all predictions”
  • 95% CIs were bootstrapped via case removal and the percentile method.

Sample size

data_after_exclusions %>%
  count(experiment_condition,
        deepfake_detected) %>%
  drop_na() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
experiment_condition deepfake_detected n
genuine FALSE 188
genuine TRUE 16
deepfaked FALSE 273
deepfaked TRUE 99
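For reference, the point estimates implied by these full-sample counts (treating a detected Deepfake as a true positive) line up with the bootstrap medians reported below:

```r
# point estimates of the classification stats from the full-sample counts
# (deepfaked video & detected = true positive)
TP <- 99;  FN <- 273  # deepfaked videos: detected / missed
FP <- 16;  TN <- 188  # genuine videos: falsely flagged / correctly passed

sensitivity <- TP / (TP + FN)            # ~0.27
specificity <- TN / (TN + FP)            # ~0.92
false_negative_rate <- 1 - sensitivity   # ~0.73
false_positive_rate <- 1 - specificity   # ~0.08
informedness <- sensitivity + specificity - 1         # Youden's J, ~0.19
balanced_accuracy <- (sensitivity + specificity) / 2  # ~0.59
```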

Bootstrapped classification stats

if(file.exists("models/fit_classification_bootstraps.rds")){

  fit_classification_bootstraps <- read_rds("models/fit_classification_bootstraps.rds")

} else {

  # create bootstrap resamples; produces a tibble with a `splits` list-column of resampled data frames
  boots <- data_after_exclusions %>%
    select(experiment_condition, deepfake_detected) %>%
    drop_na() %>%
    bootstraps(times = 2000)

  # generalize to a summarize function ------
  bootstrap_categorization_stats <- function(split) {

    data_counts <- analysis(split) %>%
      count(experiment_condition, deepfake_detected)

    TP <- pull(filter(data_counts, experiment_condition == "deepfaked" & deepfake_detected == TRUE),  n)
    FP <- pull(filter(data_counts, experiment_condition == "genuine"   & deepfake_detected == TRUE),  n)
    FN <- pull(filter(data_counts, experiment_condition == "deepfaked" & deepfake_detected == FALSE), n)
    TN <- pull(filter(data_counts, experiment_condition == "genuine"   & deepfake_detected == FALSE), n)

    #accuracy <- (TP+TN)/(TP+TN+FP+FN)
    sensitivity <- TP / (TP+FN)
    false_negative_rate <- 1 - sensitivity
    specificity <- TN / (TN+FP)
    false_positive_rate <- 1 - specificity

    # Youden's J statistic, aka informedness: the probability of an informed
    # decision (as opposed to a random guess), taking all predictions into
    # account. A value of zero means the test gives the same proportion of
    # positive results for groups with and without the condition, i.e., the
    # test is useless.
    informedness <- sensitivity + specificity - 1

    balanced_accuracy <- (sensitivity + specificity)/2

    results <-
      tibble(variable = c(
        #"accuracy",
        "balanced_accuracy",
        "informedness",
        #"sensitivity",
        "false_negative_rate",
        #"specificity",
        "false_positive_rate"
      ),
      value = c(
        #accuracy,
        balanced_accuracy,
        informedness,
        #sensitivity,
        false_negative_rate,
        #specificity,
        false_positive_rate
      ))

    return(results)
  }

  # apply to each bootstrap
  fit_classification_bootstraps <- boots %>%
    mutate(categorization_stats = map(splits, bootstrap_categorization_stats)) %>%
    select(-splits) %>%
    unnest(categorization_stats)

  write_rds(fit_classification_bootstraps, "models/fit_classification_bootstraps.rds")

}

fit_classification_bootstraps %>%
  group_by(variable) %>%
  summarize(median = quantile(value, 0.500),
            ci_lower = quantile(value, 0.025),
            ci_upper = quantile(value, 0.975),
            .groups = "drop") %>%
  mutate_if(is.numeric, round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
variable median ci_lower ci_upper
balanced_accuracy 0.59 0.56 0.62
false_negative_rate 0.73 0.69 0.78
false_positive_rate 0.08 0.04 0.12
informedness 0.19 0.13 0.25

H3: Participants are poor at making accurate and informed judgements about whether online video content is genuine or Deepfaked. Our predictions here are descriptive/continuous rather than involving cut-off-based inference rules.

H3a

We expect participants to be poor at correctly detecting Deepfakes (i.e., demonstrate a high false negative rate, FNR ≳ .80).

H3b

We expect participants to incorrectly detect Deepfakes even when the video content was real (i.e., demonstrate a high false positive rate, FPR ≳ .05).

H3c

We expect participants to be poor at making accurate decisions about whether content is genuine or not (i.e., balanced accuracy not greatly above chance, ≲ .60).

H3d

We expect participants to make poorly informed decisions about whether content is genuine or not (i.e., informedness/Youden’s J ≲ .25).

Even the subset of participants who were aware of the concept of deepfakes before the study?

Sample size

data_after_exclusions %>%
  filter(deepfake_concept_check == TRUE) %>%
  count(experiment_condition,
        deepfake_detected) %>%
  drop_na() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
experiment_condition deepfake_detected n
genuine FALSE 96
genuine TRUE 10
deepfaked FALSE 82
deepfaked TRUE 32

Bootstrapped classification stats

if(file.exists("models/fit_classification_bootstraps_subset.rds")){

  fit_classification_bootstraps_subset <- read_rds("models/fit_classification_bootstraps_subset.rds")

} else {

  # create bootstrap resamples; produces a tibble with a `splits` list-column of resampled data frames
  boots <- data_after_exclusions %>%
    filter(deepfake_concept_check == TRUE) %>%
    select(experiment_condition, deepfake_detected) %>%
    drop_na() %>%
    bootstraps(times = 2000)

  # generalize to a summarize function ------
  bootstrap_categorization_stats <- function(split) {

    data_counts <- analysis(split) %>%
      count(experiment_condition, deepfake_detected)

    TP <- pull(filter(data_counts, experiment_condition == "deepfaked" & deepfake_detected == TRUE),  n)
    FP <- pull(filter(data_counts, experiment_condition == "genuine"   & deepfake_detected == TRUE),  n)
    FN <- pull(filter(data_counts, experiment_condition == "deepfaked" & deepfake_detected == FALSE), n)
    TN <- pull(filter(data_counts, experiment_condition == "genuine"   & deepfake_detected == FALSE), n)

    #accuracy <- (TP+TN)/(TP+TN+FP+FN)
    sensitivity <- TP / (TP+FN)
    false_negative_rate <- 1 - sensitivity
    specificity <- TN / (TN+FP)
    false_positive_rate <- 1 - specificity

    # Youden's J statistic, aka informedness: the probability of an informed
    # decision (as opposed to a random guess), taking all predictions into
    # account. A value of zero means the test gives the same proportion of
    # positive results for groups with and without the condition, i.e., the
    # test is useless.
    informedness <- sensitivity + specificity - 1

    balanced_accuracy <- (sensitivity + specificity)/2

    results <-
      tibble(variable = c(
        #"accuracy",
        "balanced_accuracy",
        "informedness",
        #"sensitivity",
        "false_negative_rate",
        #"specificity",
        "false_positive_rate"
      ),
      value = c(
        #accuracy,
        balanced_accuracy,
        informedness,
        #sensitivity,
        false_negative_rate,
        #specificity,
        false_positive_rate
      ))

    return(results)
  }

  # apply to each bootstrap
  fit_classification_bootstraps_subset <- boots %>%
    mutate(categorization_stats = map(splits, bootstrap_categorization_stats)) %>%
    select(-splits) %>%
    unnest(categorization_stats)

  write_rds(fit_classification_bootstraps_subset, "models/fit_classification_bootstraps_subset.rds")

}

fit_classification_bootstraps_subset %>%
  group_by(variable) %>%
  summarize(median = quantile(value, 0.500),
            ci_lower = quantile(value, 0.025),
            ci_upper = quantile(value, 0.975),
            .groups = "drop") %>%
  mutate_if(is.numeric, round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
variable median ci_lower ci_upper
balanced_accuracy 0.59 0.54 0.64
false_negative_rate 0.72 0.63 0.80
false_positive_rate 0.09 0.04 0.15
informedness 0.19 0.09 0.28

Does detecting a deepfake make you immune to it?

This analysis uses the subset of participants who received Deepfaked videos and also correctly detected them. We fit the same Bayesian multilevel models as employed above, using only source_valence as the IV, to test whether learning effects are credibly non-zero in this subset.

Sample sizes

data_detectors_subset <- data_after_exclusions %>%
  filter(experiment_condition == "deepfaked" & deepfake_detected == TRUE)

data_detectors_subset %>%
  count(source_valence) %>%
  rename(n_experiments_4_to_6 = n) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
source_valence n_experiments_4_to_6
negative 51
positive 48
data_detectors_subset %>%
  filter(experiment == 6) %>%
  count(source_valence) %>%
  rename(n_experiment_6 = n) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
source_valence n_experiment_6
negative 14
positive 10

The intentions DV was present only in Experiment 6. N is particularly low for this analysis, so its results should be interpreted with additional caution.

Self-reported evaluations

Fit model

fit_selfreport_deepfaked_detected <-
  brm(formula = mean_self_reported_evaluation ~ source_valence + (1 | experiment),
      family  = gaussian(),
      data    = data_detectors_subset,
      file    = "models/fit_selfreport_deepfaked_detected",
      prior   = prior(normal(0, 10)),
      iter    = 10000,
      warmup  = 3000,
      control = list(adapt_delta = 0.99),  # to avoid divergent transitions
      chains  = 4,
      cores   = parallel::detectCores())

Inspect convergence

summary(fit_selfreport_deepfaked_detected)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_self_reported_evaluation ~ source_valence + (1 | experiment) 
##    Data: data_detectors_subset (Number of observations: 99) 
## Samples: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
##          total post-warmup samples = 28000
## 
## Group-Level Effects: 
## ~experiment (Number of levels: 3) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.44      0.55     0.01     1.96 1.00     5085     4924
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                 -1.60      0.37    -2.40    -0.85 1.00     6252
## source_valencepositive     2.75      0.24     2.28     3.22 1.00    17481
##                        Tail_ESS
## Intercept                  5208
## source_valencepositive    15763
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.17      0.09     1.02     1.36 1.00    16377    16890
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_selfreport_deepfaked_detected, ask = FALSE)

Check informativeness of prior

Using Gelman’s (2019) simple heuristic: for each parameter, compare the posterior SD to the prior SD. If a parameter’s posterior SD is more than 0.1 times its prior SD, the prior is flagged as informative for that parameter.
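A minimal base-R sketch of this check, using the normal(0, 10) prior from the model above and the posterior SD of source_valencepositive from the model summary (the helper name is illustrative):

```r
# sketch of the prior-informativeness heuristic: flag the prior as
# informative when the posterior SD exceeds 0.1 * prior SD
prior_informative <- function(posterior_sd, prior_sd, threshold = 0.1) {
  posterior_sd > threshold * prior_sd
}

prior_sd <- 10  # the normal(0, 10) prior used in the model above

# the posterior SD for source_valencepositive in the summary is 0.24,
# well below 0.1 * 10 = 1, so the prior is classed as uninformative here
prior_informative(posterior_sd = 0.24, prior_sd = prior_sd)  # FALSE
```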

check_prior(fit_selfreport_deepfaked_detected) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter Prior_Quality
b_Intercept uninformative
b_source_valencepositive uninformative

Interpret posteriors

#plot_model(fit_selfreport_deepfaked_detected)
plot_model(fit_selfreport_deepfaked_detected, type = "pred", terms = "source_valence")

# results table
draws_sr_deepfaked_detected <-
  select(spread_draws(fit_selfreport_deepfaked_detected, b_source_valencepositive), b_source_valencepositive) %>%
  rename(effect_deepfaked_detected = b_source_valencepositive)

estimates_sr_deepfaked_detected <-
  map_estimate(draws_sr_deepfaked_detected) %>%
  full_join(bayestestR::hdi(draws_sr_deepfaked_detected, ci = .95) %>%
              rename(CI_95_lower = CI_low,
                     CI_95_upper = CI_high) %>%
              as_tibble(),
            by = "Parameter") %>%
  full_join(bayestestR::hdi(draws_sr_deepfaked_detected, ci = .90) %>%
              as_tibble() %>%
              rename(CI_90_lower = CI_low,
                     CI_90_upper = CI_high),
            by = "Parameter") %>%
  full_join(draws_sr_deepfaked_detected %>%
              gather(Parameter, value) %>%
              group_by(Parameter) %>%
              summarize(pd = mean(value > 0)) %>%
              mutate(p = pd_to_p(pd, direction = "one-sided")) %>%
              ungroup() %>%
              select(Parameter, p),
            by = "Parameter") %>%
  select(Parameter, MAP_Estimate, CI_95_lower, CI_95_upper, 
         CI_90_lower, CI_90_upper, p) 

bind_rows(filter(estimates_sr, Parameter %in% c("effect_deepfaked")),
          estimates_sr_deepfaked_detected) %>%
  mutate_at(.vars = c("MAP_Estimate", "CI_95_lower", "CI_95_upper", "CI_90_lower", "CI_90_upper"), round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter MAP_Estimate CI_95_lower CI_95_upper CI_90_lower CI_90_upper p
effect_deepfaked 2.78 2.63 2.95 2.66 2.93 0
effect_deepfaked_detected 2.74 2.27 3.21 2.36 3.14 0
# hypothesis testing
H4a <- ifelse((estimates_sr_deepfaked_detected %>% filter(Parameter == "effect_deepfaked_detected") %>%
                 pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H4a

In the subset of participants who were shown a Deepfaked video and accurately detected that the video was Deepfaked, the content of the Deepfaked videos (i.e., Source Valence) will influence participants’ self-reported evaluations. Specifically, we will use a Bayesian linear model (model 3) to estimate a 95% Credible Interval on the standardized effect size of the change in self-reported evaluations between Source Valence conditions in this Deepfaked-and-detected subgroup. Credible Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

Implicit

Fit model

fit_implicit_deepfaked_detected <-
  brm(formula = IAT_D2 ~ source_valence + (1 | experiment),
      family  = gaussian(),
      data    = data_detectors_subset,
      file    = "models/fit_implicit_deepfaked_detected",
      prior   = prior(normal(0, 10)),
      iter    = 10000,
      warmup  = 3000,
      control = list(adapt_delta = 0.99),  # to avoid divergent transitions
      chains  = 4,
      cores   = parallel::detectCores())

Inspect convergence

summary(fit_implicit_deepfaked_detected)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: IAT_D2 ~ source_valence + (1 | experiment) 
##    Data: data_detectors_subset (Number of observations: 99) 
## Samples: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
##          total post-warmup samples = 28000
## 
## Group-Level Effects: 
## ~experiment (Number of levels: 3) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.04      0.81     0.24     3.26 1.00     5697     9416
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  0.07      0.67    -1.35     1.52 1.00     6969
## source_valencepositive     1.07      0.19     0.70     1.43 1.00    18710
##                        Tail_ESS
## Intercept                  7143
## source_valencepositive    16473
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.91      0.07     0.79     1.06 1.00    17772    15970
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_implicit_deepfaked_detected, ask = FALSE)

Check informativeness of prior

Using Gelman’s (2019) simple heuristic: for each parameter, compare the posterior SD to the prior SD. If a parameter’s posterior SD is more than 0.1 times its prior SD, the prior is flagged as informative for that parameter.

check_prior(fit_implicit_deepfaked_detected) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter Prior_Quality
b_Intercept uninformative
b_source_valencepositive uninformative

Interpret posteriors

#plot_model(fit_implicit_deepfaked_detected)
plot_model(fit_implicit_deepfaked_detected, type = "pred", terms = "source_valence")

# results table
draws_imp_deepfaked_detected <-
  select(spread_draws(fit_implicit_deepfaked_detected, b_source_valencepositive), b_source_valencepositive) %>%
  rename(effect_deepfaked_detected = b_source_valencepositive)

estimates_imp_deepfaked_detected <-
  map_estimate(draws_imp_deepfaked_detected) %>%
  full_join(bayestestR::hdi(draws_imp_deepfaked_detected, ci = .95) %>%
              rename(CI_95_lower = CI_low,
                     CI_95_upper = CI_high) %>%
              as_tibble(),
            by = "Parameter") %>%
  full_join(bayestestR::hdi(draws_imp_deepfaked_detected, ci = .90) %>%
              as_tibble() %>%
              rename(CI_90_lower = CI_low,
                     CI_90_upper = CI_high),
            by = "Parameter") %>%
  full_join(draws_imp_deepfaked_detected %>%
              gather(Parameter, value) %>%
              group_by(Parameter) %>%
              summarize(pd = mean(value > 0)) %>%
              mutate(p = pd_to_p(pd, direction = "one-sided")) %>%
              ungroup() %>%
              select(Parameter, p),
            by = "Parameter") %>%
  select(Parameter, MAP_Estimate, CI_95_lower, CI_95_upper, 
         CI_90_lower, CI_90_upper, p) 

bind_rows(filter(estimates_imp, Parameter %in% c("effect_deepfaked")),
          estimates_imp_deepfaked_detected) %>%
  mutate_at(.vars = c("MAP_Estimate", "CI_95_lower", "CI_95_upper", "CI_90_lower", "CI_90_upper"), round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter MAP_Estimate CI_95_lower CI_95_upper CI_90_lower CI_90_upper p
effect_deepfaked 1.39 1.24 1.55 1.26 1.52 0
effect_deepfaked_detected 1.05 0.69 1.42 0.76 1.37 0
# hypothesis testing
H4b <- ifelse((estimates_imp_deepfaked_detected %>% filter(Parameter == "effect_deepfaked_detected") %>%
                 pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H4b

In the subset of participants who were shown a Deepfaked video and accurately detected that the video was Deepfaked, the content of the Deepfaked videos (i.e., Source Valence) will influence participants’ IAT D2 scores. Specifically, we will use a Bayesian linear model (model 4) to estimate a 95% Credible Interval on the standardized effect size of the change in IAT D2 scores between Source Valence conditions in this Deepfaked-and-detected subgroup. Credible Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

Behavioural intentions

Fit model

fit_intentions_deepfaked_detected <-
  brm(formula = mean_intentions ~ source_valence, # no random effect for experiment as only exp 6 assessed intentions
      family  = gaussian(),
      data    = data_detectors_subset,
      file    = "models/fit_intentions_deepfaked_detected",
      prior   = prior(normal(0, 10)),
      iter    = 10000,
      warmup  = 3000,
      control = list(adapt_delta = 0.99),  # to avoid divergent transitions
      chains  = 4,
      cores   = parallel::detectCores())

Inspect convergence

summary(fit_intentions_deepfaked_detected)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_intentions ~ source_valence 
##    Data: data_detectors_subset (Number of observations: 24) 
## Samples: 4 chains, each with iter = 10000; warmup = 3000; thin = 1;
##          total post-warmup samples = 28000
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                 -3.27      0.27    -3.80    -2.75 1.00    18627
## source_valencepositive     2.71      0.42     1.88     3.53 1.00    18437
##                        Tail_ESS
## Intercept                 15502
## source_valencepositive    15269
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.99      0.16     0.74     1.36 1.00    16714    15060
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
plot(fit_intentions_deepfaked_detected, ask = FALSE)

Check informativeness of prior

Using Gelman’s (2019) simple heuristic: for each parameter, compare the posterior SD to the prior SD. If a parameter’s posterior SD is more than 0.1 times its prior SD, the prior is flagged as informative for that parameter.

check_prior(fit_intentions_deepfaked_detected) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter Prior_Quality
b_Intercept uninformative
b_source_valencepositive uninformative

Interpret posteriors

#plot_model(fit_intentions_deepfaked_detected)
plot_model(fit_intentions_deepfaked_detected, type = "pred", terms = "source_valence")

# results table
draws_intentions_deepfaked_detected <-
  select(spread_draws(fit_intentions_deepfaked_detected, b_source_valencepositive), b_source_valencepositive) %>%
  rename(effect_deepfaked_detected = b_source_valencepositive)

estimates_intentions_deepfaked_detected <-
  map_estimate(draws_intentions_deepfaked_detected) %>%
  full_join(bayestestR::hdi(draws_intentions_deepfaked_detected, ci = .95) %>%
              rename(CI_95_lower = CI_low,
                     CI_95_upper = CI_high) %>%
              as_tibble(),
            by = "Parameter") %>%
  full_join(bayestestR::hdi(draws_intentions_deepfaked_detected, ci = .90) %>%
              as_tibble() %>%
              rename(CI_90_lower = CI_low,
                     CI_90_upper = CI_high),
            by = "Parameter") %>%
  full_join(draws_intentions_deepfaked_detected %>%
              gather(Parameter, value) %>%
              group_by(Parameter) %>%
              summarize(pd = mean(value > 0)) %>%
              mutate(p = pd_to_p(pd, direction = "one-sided")) %>%
              ungroup() %>%
              select(Parameter, p),
            by = "Parameter") %>%
  select(Parameter, MAP_Estimate, CI_95_lower, CI_95_upper, 
         CI_90_lower, CI_90_upper, p)

bind_rows(filter(estimates_intentions, Parameter %in% c("effect_deepfaked")),
          estimates_intentions_deepfaked_detected) %>%
  mutate_at(.vars = c("MAP_Estimate", "CI_95_lower", "CI_95_upper", "CI_90_lower", "CI_90_upper"), round, digits = 2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE)
Parameter MAP_Estimate CI_95_lower CI_95_upper CI_90_lower CI_90_upper p
effect_deepfaked 1.37 0.99 1.76 1.04 1.69 0
effect_deepfaked_detected 2.72 1.87 3.51 2.03 3.39 0
# hypothesis testing
H4c <- ifelse((estimates_intentions_deepfaked_detected %>% filter(Parameter == "effect_deepfaked_detected") %>%
                 pull(CI_95_lower)) > 0, 
              "Accepted", "Rejected")

H4c

In the subset of participants who were shown a Deepfaked video and accurately detected that the video was Deepfaked, the content of the Deepfaked videos (i.e., Source Valence) will influence participants’ behavioral intention scores. Specifically, we will use a Bayesian linear model (model 5) to estimate a 95% Credible Interval on the standardized effect size of the change in behavioral intention scores between Source Valence conditions in this Deepfaked-and-detected subgroup. Credible Intervals whose lower bounds are > 0 will be considered evidence in support of this hypothesis.

  • Result: Accepted

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] semTools_0.5-3   lavaan_0.6-7     IATscores_0.2.7  broom_0.7.2     
##  [5] rsample_0.0.7    psych_2.0.9      sjPlot_2.8.4     bayestestR_0.7.5
##  [9] tidybayes_2.0.3  brms_2.14.0      Rcpp_1.0.5       kableExtra_1.3.1
## [13] knitr_1.30       forcats_0.5.0    stringr_1.4.0    dplyr_1.0.2     
## [17] purrr_0.3.4      readr_1.3.1      tidyr_1.1.2      tibble_3.0.4    
## [21] ggplot2_3.3.2    tidyverse_1.3.0 
## 
## loaded via a namespace (and not attached):
##   [1] tidyselect_1.1.0     lme4_1.1-25          htmlwidgets_1.5.1   
##   [4] grid_4.0.2           munsell_0.5.0        codetools_0.2-16    
##   [7] effectsize_0.4.0     statmod_1.4.34       DT_0.13             
##  [10] future_1.19.1        miniUI_0.1.1.1       withr_2.3.0         
##  [13] Brobdingnag_1.2-6    colorspace_2.0-0     highr_0.8           
##  [16] rstudioapi_0.13      stats4_4.0.2         bayesplot_1.7.2     
##  [19] listenv_0.8.0        labeling_0.4.2       huge_1.3.4.1        
##  [22] emmeans_1.4.6        rstan_2.21.2         mnormt_1.5-7        
##  [25] farver_2.0.3         bridgesampling_1.0-0 coda_0.19-3         
##  [28] vctrs_0.3.5          generics_0.0.2       TH.data_1.0-10      
##  [31] xfun_0.19            R6_2.5.0             markdown_1.1        
##  [34] assertthat_0.2.1     promises_1.1.0       scales_1.1.1        
##  [37] multcomp_1.4-13      nnet_7.3-14          gtable_0.3.0        
##  [40] globals_0.13.1       processx_3.4.4       sandwich_2.5-1      
##  [43] rlang_0.4.8          splines_4.0.2        checkmate_2.0.0     
##  [46] inline_0.3.16        yaml_2.2.1           reshape2_1.4.4      
##  [49] abind_1.4-5          modelr_0.1.8         d3Network_0.5.2.1   
##  [52] threejs_0.3.3        crosstalk_1.1.0.1    backports_1.1.9     
##  [55] httpuv_1.5.2         rsconnect_0.8.16     Hmisc_4.4-1         
##  [58] tools_4.0.2          ellipsis_0.3.1       RColorBrewer_1.1-2  
##  [61] ggridges_0.5.2       plyr_1.8.6           base64enc_0.1-3     
##  [64] ps_1.4.0             prettyunits_1.1.1    rpart_4.1-15        
##  [67] pbapply_1.4-2        zoo_1.8-8            qgraph_1.6.5        
##  [70] haven_2.3.1          cluster_2.1.0        fs_1.4.1            
##  [73] furrr_0.2.1          magrittr_2.0.1       data.table_1.13.2   
##  [76] colourpicker_1.0     reprex_0.3.0         mvtnorm_1.1-1       
##  [79] whisker_0.4          sjmisc_2.8.5         matrixStats_0.56.0  
##  [82] hms_0.5.3            shinyjs_1.1          mime_0.9            
##  [85] evaluate_0.14        arrayhelpers_1.1-0   xtable_1.8-4        
##  [88] shinystan_2.5.0      sjstats_0.18.0       jpeg_0.1-8.1        
##  [91] readxl_1.3.1         gridExtra_2.3        ggeffects_0.14.3    
##  [94] rstantools_2.1.1     compiler_4.0.2       V8_3.2.0            
##  [97] crayon_1.3.4         minqa_1.2.4          StanHeaders_2.21.0-6
## [100] htmltools_0.5.0      corpcor_1.6.9        later_1.0.0         
## [103] Formula_1.2-3        RcppParallel_5.0.2   lubridate_1.7.9     
## [106] DBI_1.1.0            sjlabelled_1.1.7     dbplyr_1.4.3        
## [109] MASS_7.3-53          boot_1.3-25          Matrix_1.2-18       
## [112] cli_2.1.0            insight_0.10.0       igraph_1.2.5        
## [115] BDgraph_2.62         pkgconfig_2.0.3      foreign_0.8-80      
## [118] xml2_1.3.2           svUnit_1.0.3         pbivnorm_0.6.0      
## [121] dygraphs_1.1.1.6     webshot_0.5.2        estimability_1.3    
## [124] rvest_0.3.5          snakecase_0.11.0     callr_3.5.1         
## [127] digest_0.6.27        parameters_0.8.6     rmarkdown_2.5       
## [130] cellranger_1.1.0     htmlTable_1.13.3     curl_4.3            
## [133] shiny_1.5.0          gtools_3.8.2         rjson_0.2.20        
## [136] nloptr_1.2.2.2       glasso_1.11          lifecycle_0.2.0     
## [139] nlme_3.1-148         jsonlite_1.7.1       viridisLite_0.3.0   
## [142] fansi_0.4.1          pillar_1.4.6         lattice_0.20-41     
## [145] loo_2.3.1            fastmap_1.0.1        httr_1.4.1          
## [148] pkgbuild_1.1.0       survival_3.1-12      glue_1.4.2          
## [151] xts_0.12-0           fdrtool_1.2.15       png_0.1-7           
## [154] shinythemes_1.1.2    stringi_1.4.6        performance_0.4.6   
## [157] latticeExtra_0.6-29